AITopics | future frame

Collaborating Authors

future frame

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Probabilistic Modeling of Future Frames from a Single Image

Tianfan Xue, Jiajun Wu, Katherine Bouman, Bill Freeman

Neural Information Processing SystemsMar-22-2026, 23:50:00 GMT

We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, as well as on real-world video frames. We also show that our model can be applied to visual analogy-making, and present an analysis of the learned network representations.

artificial intelligence, future frame, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

Neural Information Processing SystemsMar-17-2026, 06:33:25 GMT

We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose a novel approach which models future frames in a probabilistic manner. Our proposed method is therefore able to synthesize multiple possible next frames using the same model. Solving this challenging problem involves low-and high-level image and motion understanding for successful image synthesis. Here, we propose a novel network structure, namely a Cross Convolutional Network, that encodes images as feature maps and motion information as convolutional kernels to aid in synthesizing future frames. In experiments, our model performs well on both synthetic data, such as 2D shapes and animated game sprites, as well as on real-wold video data. We show that our model can also be applied to tasks such as visual analogy-making, and present analysis of the learned network representations.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Siamese Masked Autoencoders

Neural Information Processing SystemsFeb-15-2026, 12:44:53 GMT

SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them.

artificial intelligence, machine learning, natural language, (11 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction

Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim

Neural Information Processing SystemsFeb-12-2026, 17:57:18 GMT

Neural Information Processing Systems http://nips.cc/

dataset, keypoint, video, (13 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

MCVD - Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Neural Information Processing SystemsFeb-5-2026, 17:00:30 GMT

Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames.

artificial intelligence, machine learning, masked conditional video diffusion, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback

Siamese Masked Autoencoders

Neural Information Processing SystemsDec-26-2025, 05:48:53 GMT

Establishing correspondence between images or scenes is a significant challenge in computer vision, especially given occlusions, viewpoint changes, and varying object appearances.

electronic proceedings, name change, siamese masked autoencoder, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.58)

Add feedback

Distilling Future Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection

Zheng, Haowen, Zhu, Hu, Deng, Lu, Gu, Weihao, Yang, Yang, Liang, Yanyan

arXiv.org Artificial IntelligenceDec-10-2025

Camera-based temporal 3D object detection has shown impressive results in autonomous driving, with offline models improving accuracy by using future frames. Knowledge distillation (KD) can be an appealing framework for transferring rich information from offline models to online models. However, existing KD methods overlook future frames, as they mainly focus on spatial feature distillation under strict frame alignment or on temporal relational distillation, thereby making it challenging for online models to effectively learn future knowledge. To this end, we propose a sparse query-based approach, Future Temporal Knowledge Distillation (FTKD), which effectively transfers future frame knowledge from an offline teacher model to an online student model. Specifically, we present a future-aware feature reconstruction strategy to encourage the student model to capture future features without strict frame alignment. In addition, we further introduce future-guided logit distillation to leverage the teacher's stable foreground and background context. FTKD is applied to two high-performing 3D object detection baselines, achieving up to 1.3 mAP and 1.3 NDS gains on the nuScenes dataset, as well as the most accurate velocity estimation, without increasing inference cost.

artificial intelligence, distillation, temporal reasoning, (17 more...)

arXiv.org Artificial Intelligence

2512.08247

Genre: Research Report (0.82)

Industry:

Education (1.00)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.62)

Add feedback

This AI Model Can Intuit How the Physical World Works

WIREDDec-7-2025

As the engineers who build self-driving cars know, it can be hard to get an AI system to reliably make sense of what it sees. Most systems designed to "understand" videos in order to either classify their content ("a person playing tennis," for example) or identify the contours of an object--say, a car up ahead--work in what's called "pixel space." The model essentially treats every pixel in a video as equal in importance. But these pixel-space models come with limitations. Imagine trying to make sense of a suburban street. If the scene has cars, traffic lights and trees, the model might focus too much on irrelevant details such as the motion of the leaves. It might miss the color of the traffic light, or the positions of nearby cars. "When you go to images or video, you don't want to work in [pixel] space because there are too many details you don't want to model," said Randall Balestriero, a computer scientist at Brown University. Yann LeCun, a computer scientist at New York University and the director of AI research at Meta, created JEPA, a predecessor to V-JEPA that works on still images, in 2022.

artificial intelligence, machine learning, video, (17 more...)

WIRED

Country:

North America > United States > New York (0.24)
North America > United States > California (0.04)
Europe > Slovakia (0.04)
(2 more...)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Add feedback

Light Future: Multimodal Action Frame Prediction via InstructPix2Pix

Zhong, Zesen, Zhang, Duomin, Li, Yijia

arXiv.org Artificial IntelligenceNov-5-2025

Predicting future motion trajectories is a critical capability across domains such as robotics, autonomous systems, and human activity forecasting, enabling safer and more intelligent decision-making. This paper proposes a novel, efficient, and lightweight approach for robot action prediction, offering significantly reduced computational cost and inference latency compared to conventional video prediction models. Importantly, it pioneers the adaptation of the InstructPix2Pix model for forecasting future visual frames in robotic tasks, extending its utility beyond static image editing. We implement a deep learning-based visual prediction framework that forecasts what a robot will observe 100 frames (10 seconds) into the future, given a current image and a textual instruction. We repurpose and fine-tune the InstructPix2Pix model to accept both visual and textual inputs, enabling multimodal future frame prediction. Experiments on the RoboTWin dataset (generated based on real-world scenarios) demonstrate that our method achieves superior SSIM and PSNR compared to state-of-the-art baselines in robot action prediction tasks. Unlike conventional video prediction models that require multiple input frames, heavy computation, and slow inference latency, our approach only needs a single image and a text prompt as input. This lightweight design enables faster inference, reduced GPU demands, and flexible multimodal control, particularly valuable for applications like robotics and sports motion trajectory analytics, where motion trajectory precision is prioritized over visual fidelity.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.14809

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SAMPO:Scale-wise Autoregression with Motion PrOmpt for generative world models

Wang, Sen, Tian, Jingyi, Wang, Le, Liao, Zhimin, Li, Jiayi, Dong, Huaiyi, Xia, Kun, Zhou, Sanping, Tang, Wei, Gang, Hua

arXiv.org Artificial IntelligenceOct-22-2025

World models allow agents to simulate the consequences of actions in imagined environments for planning, control, and long-horizon decision-making. However, existing autoregressive world models struggle with visually coherent predictions due to disrupted spatial structure, inefficient decoding, and inadequate motion modeling. In response, we propose \textbf{S}cale-wise \textbf{A}utoregression with \textbf{M}otion \textbf{P}r\textbf{O}mpt (\textbf{SAMPO}), a hybrid framework that combines visual autoregressive modeling for intra-frame generation with causal modeling for next-frame generation. Specifically, SAMPO integrates temporal causal decoding with bidirectional spatial attention, which preserves spatial locality and supports parallel decoding within each scale. This design significantly enhances both temporal consistency and rollout efficiency. To further improve dynamic scene understanding, we devise an asymmetric multi-scale tokenizer that preserves spatial details in observed frames and extracts compact dynamic representations for future frames, optimizing both memory usage and model performance. Additionally, we introduce a trajectory-aware motion prompt module that injects spatiotemporal cues about object and robot trajectories, focusing attention on dynamic regions and improving temporal consistency and physical realism. Extensive experiments show that SAMPO achieves competitive performance in action-conditioned video prediction and model-based control, improving generation quality with 4.4$\times$ faster inference. We also evaluate SAMPO's zero-shot generalization and scaling behavior, demonstrating its ability to generalize to unseen tasks and benefit from larger model sizes.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.15536

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback